Statistical Properties of Loss Rate Estimators in Tree Topology
Abstract

Three types of explicit estimators are proposed here to estimate the loss rates of the links in a network of the tree topology.
All of them are derived by the maximum likelihood principle and are proved to be either asymptotically unbiased or unbiased. In
addition, a set of formulae is derived to compute the efficiencies and variances of the estimators, which also covers some of the
estimators proposed previously. The formulae unveil that the variance of the estimates obtained by a maximum likelihood estimator
for the pass rate of the root link of a multicast tree is equal to the variance of the pass rate of the multicast tree divided by the
pass rate of the tree connected to the root link. Using the formulae, we are able to evaluate the estimators proposed so far and
select an estimator for a data set.

Index Terms 
I. Introduction


Network characteristics, such as link-level loss rate, delay distribution, available bandwidth, etc., are valuable information for
network operations, development and research. Therefore, considerable attention has been given to network measurement,
in particular to large networks that cross a number of autonomous systems, where security concerns, commercial interests, and
administrative boundaries make direct measurement impossible. To overcome the security and administrative obstacles, network
tomography was proposed in [?], where the author suggests the use of end-to-end measurement and statistical inference to
estimate the characteristics of interest. Since then, many works have been carried out to estimate various characteristics,
covering loss tomography [?], delay tomography [?], loss pattern tomography [?], and so on. Despite the enthusiasm for loss
tomography, there has been little work studying the statistical properties of an estimator with a finite sample size, although
some asymptotic properties are presented in the literature [?]. The finite sample properties, such as efficiency and variance,
differ from the asymptotic ones and are critical to the performance evaluation of an estimator, since each of them unveils the
quality and effectiveness of an estimator in a specific aspect. Apart from that, the finite sample properties can be used to
select a better estimator, if not the best, from a group for a data set obtained in a specific circumstance. To fill the gap, we
propose in this paper a number of maximum likelihood estimators (MLE) that can be solved explicitly for a network of the
tree topology and provide their statistical properties. The statistical properties are further extended to cover the MLEs
proposed previously. One of the most important findings is a set of formulae to compute the efficiency and variance of the
estimates obtained by an estimator.
The approach proposed in [?] requires us to send probing packets, called probes, from some end-nodes called sources to
the receivers located on the other side of the network, where the paths connecting the sources to the receivers cover the links
of interest. To make the received probes informative for statistical inference, multicast, or the unicast-based multicast proposed
in [?], [?], is used to send probes from a source to a number of receivers via a number of intermediate nodes that replicate the
arriving probes and then forward them to their descendants. This process continues until the probes either reach the destinations
or are lost, which makes the observations of any two receivers correlated to some degree, and the degree varies with the
interconnection between the receivers. Given the network topology used for sending probes and the observations obtained at
the receivers, we are able to create a likelihood function that connects the observations to the process described above. Since
the number of correlations created by multicasting grows with the number of descendants attached to a node, the likelihood
equation obtained for a node having many descendants is a high-degree polynomial that requires an iterative procedure, such
as the expectation-maximization (EM) or the Newton-Raphson algorithm, to approximate the solution. Using an iterative
procedure to solve a polynomial has been widely criticised for its computational complexity, which increases with the number of
descendants attached to the link or path to be estimated [?]. There has been a persistent effort in the research community to
search for explicit estimators that are comparable in accuracy to the estimators using an iterative approach. To achieve
this, we must know the statistical properties of the estimates obtained by an estimator, such as unbiasedness, efficiency, and
variance. Unfortunately, there has been little work on these properties in a general form, and the asymptotic properties
obtained in [?] are of little use in this circumstance.

To overcome the problems stated above, we have undertaken a thorough and systematic investigation of the estimators
proposed for loss tomography, aiming at identifying the statistical principles and strategies that have been used, or can be
used, in the tree topology. The investigation shows that all of the estimators proposed previously rely on observed correlations
to infer the loss/pass rates, and most of them, such as the MLE proposed in [?], use all of the available correlations in
estimation. However, the quality of the correlations, measured by the fitness between a correlation and the corresponding
observation, has been largely ignored. Rather than using all of the available correlations in estimation, we propose here to use
a small portion of high-quality ones and expect the estimates obtained by such an estimator to be comparable to those of an
estimator considering all of the correlations. The investigation further leads to a number of findings that contribute to loss
tomography in four ways:


• A large number of explicit estimators are proposed on the basis of composite likelihood [?] and are divided into three
groups: the block-wised estimators (BWE), the reduce scaled estimators (RSE), and the individual based estimators (IBE).

• The estimators in BWE and IBE are proved to be unbiased, and those in RSE are proved to be asymptotically unbiased, as
proved in [?]. A set of formulae is derived for the efficiency and variances of the estimators in RSE and IBE, plus the
MLE proposed in [?]. The formulae show that the variance of the estimates obtained by a MLE can be expressed exactly by
the pass rate of the path of interest and the pass rates of the subtrees connected to the path. The formulae also show the
weakness of the result obtained in [?].

• The efficiencies of the estimators in IBE are compared with each other on the basis of the Fisher information, which shows
that an estimator considering a correlation involving a few observers can be more efficient than one considering more, and
the estimator proposed in [?] is the least efficient. A similar conclusion is obtained for the estimators in BWE.

• Using the formulae, we are able to identify an efficient estimator by examining the end-to-end observation, which makes model
selection not only possible but also feasible. A number of simulations are conducted to verify this feature; they also show
the connection between the efficiency and the robustness of an estimator.

The rest of the paper is organised as follows. In Section II, we briefly introduce the previous works related to explicit loss
rate estimators and point out their weaknesses. In Section III, we introduce the loss model, the notations, and the statistics
used in this paper. Using the model and statistics, we derive in Section IV a MLE that considers all available correlations for a
network of the tree topology. We then decompose the MLE into a number of components according to correlations and derive
a number of likelihood equations for the components in Section V. A statistical analysis of the proposed estimators is presented
in Section VI, which details their statistical properties, including the formulae to calculate the variances of various estimators.
A simulation study is presented in Section VII that compares the performance of five estimators and shows the feasibility of
selecting an estimator for a data set. Section VIII is devoted to concluding remarks.


II. Related Works 

Multicast Inference of Network Characters (MINC) pioneered putting the ideas proposed in [?] into practice, where a
Bernoulli model is used to model the loss behaviour of a path. Using this model, the authors of [?] derive an estimator in the
form of a polynomial that is one degree less than the number of descendants connected to the end node of the path of interest.
Apart from that, the authors obtain a number of results from asymptotic theory, such as the large-sample behaviour of
the estimator and the dependency of the estimator variance on topology. Unfortunately, the results only hold if the sample size
$n$ grows indefinitely. In addition, as $n \to \infty$, almost all of the estimators proposed previously have the same asymptotic
behaviour, and one cannot tell them apart. In order to evaluate the performance of an estimator, experiments and simulations
have been widely used but lead to little result, since too many random factors affect the results obtained from experiments and
simulations.

To overcome the problem stated, simple and explicit estimators, such as that proposed in [?], are investigated, aiming
at reducing the complexity of an estimator and hopefully finding theoretical support for further development, since a simple
estimator may be easy to analyse. Using this strategy, the authors of [?] propose an explicit estimator that only considers one
correlation, i.e. the correlation involving all descendants, and claim that its estimates have the same asymptotic variance, to
first order, as those obtained by the estimator proposed in [?]. The claim is obtained by applying the central limit theorem
(CLT) to one of the results acquired by the asymptotic theory in [?], where the covariance between two descendants attached
to the path of interest is obtained by assuming the loss rate of a link is very small, and then the delta method is used to
compute the asymptotic variance from the covariance matrix obtained by the asymptotic theory. The repeated use of the CLT
makes the claim questionable and expensive to use in practice, since the result only holds if $n \to \infty$. Apart from that,
some sensitive parameters are cancelled out in the process. It is easy to prove that, under the same condition, most of the
estimators proposed so far can achieve the same result as, if not a better one than, that proposed in [?].

In contrast to [?], the estimator proposed in [?], [19] converts a general tree into a binary one and subsequently turns the
likelihood equation into a quadratic equation of $A_k$ that is solvable analytically. Experiments show the estimator performs better
than that in [?], since the estimator uses more information in estimation. Apart from experimental results, there is little statistical
analysis to demonstrate why it is better than that proposed in [?] and how to improve from there. Although the author of [19]
proves the estimator is a MLE, other statistical properties are lacking, such as whether the MLE proposed in [19] is the
same as that proposed in [?] and, if not, how much they differ.

To be able to evaluate the performance of an estimator, we need the statistical properties of the estimator, such as
unbiasedness, efficiency, variance, and so on, which differ from the asymptotic ones by showing the quality of an estimator with a
finite sample. To distinguish these properties from the asymptotic ones, we call them finite sample properties, and results on
them have been lacking. This paper aims to fill the gap and provides the properties.


III. Assumption, Notation and Sufficient Statistics 

To make the following statistical analysis clear and rigorous, we need a large number of symbols that may overwhelm
readers who are not familiar with loss tomography. To assist them, the symbols are introduced gradually throughout the
paper: the frequently used symbols are introduced in the following two sections, and the others are not brought up until
needed. In addition, the most frequently used symbols and their meanings are presented in Table I for quick reference.


A. Assumption 

We assume the probes multicast from the source to the receivers are independent and network traffic remains statistically
stable during the measurement. In addition, the observations obtained at the receivers are considered to be independent and
identically distributed (i.i.d.). Further, the losses occurring at a node or on a link are assumed to be i.i.d. as well.


B. Notation 

As stated, a network of the tree topology is considered in this paper. It is denoted by $T = (V, E)$ and multicasts probes
from the source to a number of receivers, where $V = \{v_0, v_1, \dots, v_m\}$ is a set of nodes and $E = \{e_1, \dots, e_m\}$ is a set
of directed links that connect the nodes in $V$. In addition, $v_k, k \in \{1, \dots, m\}$ is often called node $k$, and $e_k$ is called
link $k$ in the following discussion. By default, node 0 is the root node of the multicast tree, to which the source is attached.
Apart from being the root, which does not have a parent, node 0 differs from the others by having a single descendant, $v_1$,
connected by $e_1$. Among the nodes in $V$, there are a number of leaf nodes that do not have any descendant, and a receiver is
attached to each leaf node. Because of this, we do not distinguish between a leaf node and a receiver, and we use $R, R \subset V$
to denote them. Since there are $m$ links connecting $m + 1$ nodes in $T$, the links and nodes are organised in such a way that,
if $f(i)$ denotes the parent of node $i$, $e_i$ is the link connecting $v_{f(i)}$ to $v_i$. Figure 1 is an example of a multicast
binary tree that is named and connected according to these rules.

A multicast tree, as a tree, can be decomposed into a number of multicast subtrees at node $i \in V \setminus (v_0 \cup R)$, where
$T(i)$ denotes the multicast subtree that has $v_{f(i)}$ as its root, $e_i$ as its root link, and $R(i)$ as the receivers attached to
$T(i)$. In addition, we use $d_i$ to denote the descendants attached to node $i$, which is a nonempty set if $i \notin R$. If $x$ is a
set, $|x|$ denotes the number of elements in $x$; thus, $|d_i|$ denotes the number of descendants in $d_i$. Using the symbols in
Figure 1, we have $R = \{v_8, v_9, \dots\}$, $R(v_2) = \{v_8, v_9, v_{10}, \dots\}$, $d_{v_2} = \{v_4, v_5\}$, and $|d_{v_2}| = 2$.
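To make the notation concrete, the sketch below derives the descendant sets $d_k$ and the receiver sets $R(k)$ from a parent map $f(\cdot)$ stored as a dictionary. It assumes, for illustration only, that Figure 1 is a full binary tree whose nodes are numbered breadth-first below the single root link $e_1$, so that $f(1) = 0$ and $f(i) = \lfloor i/2 \rfloor$ for $i \ge 2$; this layout agrees with the partial example above ($d_{v_2} = \{v_4, v_5\}$), but the figure itself is not reproduced here and the helper names are hypothetical.

```python
from collections import defaultdict

def build_tree(parent):
    """Return (children, leaves) from a parent map {node: f(node)}; node 0 is the root."""
    children = defaultdict(set)
    for node, par in parent.items():
        children[par].add(node)
    nodes = set(parent) | {0}
    leaves = {v for v in nodes if not children[v]}   # leaf nodes, i.e. the receivers R
    return children, leaves

def receivers_of(k, children):
    """R(k): the receivers attached to the subtree T(k) (k itself if k is a leaf)."""
    stack, found = [k], set()
    while stack:
        v = stack.pop()
        if children[v]:
            stack.extend(children[v])
        else:
            found.add(v)
    return found

# Assumed layout of Fig. 1: v0 -> v1, then a full binary tree numbered breadth-first.
parent = {1: 0}
parent.update({i: i // 2 for i in range(2, 16)})

children, R = build_tree(parent)
print(sorted(children[2]))                 # d_{v2}: [4, 5]
print(sorted(receivers_of(2, children)))   # R(v2): [8, 9, 10, 11]
print(sorted(R))                           # [8, 9, 10, 11, 12, 13, 14, 15]
```

Keeping only the parent map mirrors the naming rule above: link $e_i$ is identified by its lower end $v_i$, so no separate edge list is needed.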

If $n$ probes are sent from $v_0$ to $R$ in an experiment, each of them gives rise to an independent realisation of the passing
(loss) process $X$. Let $X^i, i = 1, \dots, n$ denote the $i$-th process, where $x_k^i = 1, k \in V$ if probe $i$ reaches $v_k$; otherwise,
$x_k^i = 0$. The sample $Y$, consisting of the states of the receivers for the $n$ probes, is the observation obtained in an
experiment and can be divided into a number of sections according to $R(k)$, where $Y_k, k \in V$ denotes the part of $Y$
obtained by $R(k)$. In addition, each of the sections can be further divided into subsections $Y_x, x \subseteq d_k$, the part of the
observation obtained by $R(j), j \in x$. Obviously, $Y_x \subseteq Y_k$. If we use $y_j^i$ to denote the observation of receiver $j$
for probe $i$, we have $y_j^i = 1$ if probe $i$ is observed by receiver $j$; otherwise, $y_j^i = 0$.

Although loss tomography aims at estimating the loss rate of a link, the pass rate of the path connecting $v_0$ to $v_k, k \in V$
is often used as the parameter to be estimated. Let $A_k$ be the pass rate of the path connecting $v_0$ to $v_k$, defined as the
percentage of the probes sent by the source that arrive at node $k$. Given $A_k, k \in V \setminus v_0$, we are able to compute the
pass rates of all links in $E$, since there is a bijection from the pass rates of the paths to the pass rates of the links in a
network of the tree topology. If $\alpha_k$ denotes the pass rate of link $k$, we have

$$\alpha_k = \frac{A_k}{A_{f(k)}}, \qquad (1)$$

where $A_0 = 1$. Given $\alpha_k$, we are able to compute the loss rate of link $k$, which is equal to $1 - \alpha_k$.
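The bijection behind (1) can be sketched in a few lines: dividing path pass rates gives link pass rates, and multiplying link pass rates along the path recovers the path pass rates. The three-link tree and the numeric rates below are hypothetical, chosen only for illustration.

```python
import math

def link_pass_rates(A, parent):
    """alpha_k = A_k / A_{f(k)} as in (1); A[0] = 1 for the root node v0."""
    return {k: A[k] / A[par] for k, par in parent.items()}

def path_pass_rates(alpha, parent):
    """Inverse map: A_k is the product of alpha_j over the links on the path v0 -> v_k."""
    A = {0: 1.0}

    def rate(k):
        if k not in A:
            A[k] = alpha[k] * rate(parent[k])
        return A[k]

    for k in parent:
        rate(k)
    return A

# Hypothetical three-link tree: v0 -> v1, v1 -> v2, v1 -> v3.
parent = {1: 0, 2: 1, 3: 1}
A = {0: 1.0, 1: 0.95, 2: 0.9025, 3: 0.855}

alpha = link_pass_rates(A, parent)
loss = {k: 1 - a for k, a in alpha.items()}     # loss rate of link k is 1 - alpha_k
round_trip = path_pass_rates(alpha, parent)
print(all(math.isclose(round_trip[k], A[k]) for k in A))   # True
```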


C. Statistics 

To estimate $A_k$ from $Y$, we need a likelihood function that connects the i.i.d. model defined previously to $Y$. To support the
idea of using a part of the available correlations to estimate $A_k$, a function $n_k(x), x \subseteq d_k$, defined as follows, is used to
return the statistic for the likelihood function:

$$n_k(x) = \sum_{i=1}^{n} \bigvee_{q \in x} \bigvee_{j \in R(q)} y_j^i. \qquad (2)$$

Obviously,

$$n_k(d_k) = \sum_{i=1}^{n} \bigvee_{j \in R(k)} y_j^i. \qquad (3)$$

$n_k(x)$ is the number of probes reaching node $k$, as confirmed by the observation of $R(q), q \in x$. If $N_k$ is used to denote the
number of probes reaching node $k$, we have $N_k \ge n_k(d_k) \ge n_k(x), x \subseteq d_k$, where $n_k(d_k)$ and $n_k(x)$ are statistics that
can be used to estimate $A_k$.
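Statistics (2) and (3) are plain counts over a 0/1 observation matrix. The sketch below assumes a hypothetical layout in which `obs[i][j]` holds $y_j^i$ and `subtree_receivers[q]` lists the receiver columns of $R(q)$; the tiny data set is made up to show that a probe seen in several subtrees is counted once.

```python
def n_k(obs, subtree_receivers, x):
    """n_k(x) as in (2): probes seen by at least one receiver in R(q) for some q in x.

    obs[i][j] is y_j^i (1 if receiver j saw probe i);
    subtree_receivers[q] lists the receiver columns belonging to R(q).
    """
    cols = [j for q in x for j in subtree_receivers[q]]
    return sum(1 for row in obs if any(row[j] for j in cols))

# Hypothetical node k with two subtrees: R(a) = receivers 0, 1 and R(b) = receiver 2.
subtree_receivers = {"a": [0, 1], "b": [2]}
obs = [
    [1, 0, 0],   # probe 1: seen in subtree a only
    [0, 0, 1],   # probe 2: seen in subtree b only
    [0, 0, 0],   # probe 3: seen by no receiver
    [0, 1, 1],   # probe 4: seen in both subtrees, counted once
]
n = len(obs)
print(n_k(obs, subtree_receivers, ["a"]))       # 2
print(n_k(obs, subtree_receivers, ["b"]))       # 2
nkdk = n_k(obs, subtree_receivers, ["a", "b"])  # n_k(d_k) as in (3)
print(nkdk, nkdk / n)                           # 3 0.75
```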

To write a likelihood function of $A_k$ with $n_k(d_k)$, $\beta_k$ and $\gamma_k$ are introduced to denote the pass rate of the subtrees
rooted at node $k$ and the pass rate of the special multicast tree that connects $v_0$ to node $k$ and then to $R(k)$, respectively.
Clearly, $\gamma_k = A_k \cdot \beta_k, k \in V$, and $\hat{\gamma}_k = n_k(d_k)/n$ is the empirical value of $\gamma_k$. Note that
$\hat{\gamma}_j = \frac{1}{n}\sum_{i=1}^{n} y_j^i, j \in R$, is the empirical pass rate of the path from the root to node $j$. Given the
assumptions and definitions, the likelihood function of $A_k$ for observation $n_k(d_k)$ is written as follows:

$$\mathcal{L}(A_k; n_k(d_k)) = (A_k\beta_k)^{n_k(d_k)} (1 - A_k\beta_k)^{n - n_k(d_k)}. \qquad (4)$$

We can then prove that $n_k(d_k)$ is a sufficient statistic with respect to (wrt.) the passing process of $A_k$ for the observation
obtained by $R(k)$. Rather than using the well-known factorisation theorem in the proof, we directly use the mathematical definition
of a sufficient statistic (see Definition 7.18 in [?]). The definition wrt. the statistical model defined for the passing process is
presented as a theorem here:

Theorem 1: Let $Y_k$ be an i.i.d. random sample, governed by $\mathcal{L}(A_k; Y_k)$. The statistic $n_k(d_k)$ is minimal
sufficient for $A_k$ with respect to the observation $Y_k$.

Fig. 1. A Multicast Tree


Proof: According to the definition of sufficiency, we need to prove that

$$\mathcal{L}(Y_k \mid n_k(d_k) = t) = \frac{\mathcal{L}(A_k; Y_k, n_k(d_k) = t)}{\mathcal{L}(n_k(d_k) = t)} \qquad (5)$$

is independent of $A_k$.

Given (4), the number of probes observed by $R(k)$ follows the binomial distribution:

$$\mathcal{L}(n_k(d_k) = t) = \binom{n}{t} (A_k\beta_k)^t (1 - A_k\beta_k)^{n-t}.$$

Then, we have

$$\mathcal{L}(Y_k \mid n_k(d_k) = t) = \frac{(A_k\beta_k)^t (1 - A_k\beta_k)^{n-t}}{\binom{n}{t} (A_k\beta_k)^t (1 - A_k\beta_k)^{n-t}} = \frac{1}{\binom{n}{t}}, \qquad (6)$$

which is independent of $A_k$. Hence, $n_k(d_k)$ is a sufficient statistic.

Apart from the sufficiency, $n_k(d_k)$, as defined in (3), is a count of the probes reaching $R(k)$ that counts each probe once
and only once, regardless of how many receivers observe the probe. Therefore, $n_k(d_k)$ is a minimal sufficient statistic with
regard to the observation of $R(k)$. ■
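Equation (6) can be checked by brute force for a small $n$: under the Bernoulli model assumed in (4), every 0/1 sample with $t$ observed probes has the same conditional probability $1/\binom{n}{t}$, whatever the value of $A_k\beta_k$. The sketch below enumerates all samples; the chosen values of $n$, $t$ and $p = A_k\beta_k$ are arbitrary.

```python
from itertools import product
from math import comb, isclose

def conditional_probs(n, t, p):
    """P(sample | n_k(d_k) = t) for every 0/1 sample of length n with t ones.

    Each probe is observed by R(k) independently with probability p = A_k * beta_k.
    """
    samples = [s for s in product((0, 1), repeat=n) if sum(s) == t]
    joint = [p ** sum(s) * (1 - p) ** (n - sum(s)) for s in samples]
    marginal = comb(n, t) * p ** t * (1 - p) ** (n - t)   # binomial P(n_k(d_k) = t)
    return [q / marginal for q in joint]

for p in (0.3, 0.7, 0.95):        # three different values of A_k * beta_k
    probs = conditional_probs(5, 2, p)
    print(p, len(probs), all(isclose(q, 1 / comb(5, 2)) for q in probs))
```

Each line prints the chosen $p$, the number of samples with two observed probes (10), and `True`, i.e. the conditional distribution does not depend on $p$.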


D. Statistics considering a part of observation 

Instead of using $n_k(d_k)$ to estimate $A_k$, we can use $n_k(x), x \subset d_k \wedge |x| \ge 2$, defined in Section III-C, to estimate $A_k$.
The difference between them is the number of correlations considered in estimation, where the latter considers fewer.
As in (4), $\beta_k(x), x \subseteq d_k$ is needed to express the pass rate of the subtrees consisting of $T(j), j \in x$. Given $n_k(x)$ and
$\beta_k(x)$, we can also write a likelihood function of $A_k$ and use the same procedure as in Section III-C to prove that $n_k(x)$ is a
sufficient statistic in the context of the observation obtained by $R(j), j \in x$. Further, an estimator based on the observation of
$R(j), j \in x$ can be created, which will be discussed in Section V.
TABLE I
Frequently used symbols and descriptions

Symbol                    Description
$T(k)$                    the subtree rooted at link $k$.
$d_k$                     the descendants attached to node $k$.
$R(k)$                    the receivers attached to $T(k)$.
$A_k$                     the pass rate of the path from $v_0$ to $v_k$.
$\beta_k$                 the pass rate of the subtree rooted at node $k$.
$\beta_k(x)$              the pass rate of the subtree consisting of $T(j), j \in x, x \subseteq d_k$.
$\gamma_k$                $\gamma_k = A_k\beta_k$, the pass rate from $v_0$ to $R(k)$.
$N_k$                     the number of probes reaching node $k$.
$x_k^i$                   the state of $v_k$ for probe $i$.
$S_k$                     the σ-algebra created from $d_k$.
$n$                       the number of probes sent in an experiment.
$n_k(d_k)$                the number of probes reaching $R(k)$.
$n_k(x)$                  the number of probes reaching the receivers attached to $T(j), j \in x$.
$I_k(x)$                  the number of probes observed simultaneously by the members of $x$.
$Y$                       the observation obtained in an experiment.
$Y_k, k \in V$            the part of $Y$ obtained by $R(k)$.
$Y_x, x \subseteq d_k$    the part of $Y$ obtained by $R(j), j \in x$.


IV. Estimator Analysis 

This section is dedicated to the analysis of the MLE that considers all of the correlations available in the observation. Through
the analysis, we are able to identify all of the correlations in the observation and find the connections among them, which set
up the foundation for various explicit estimators.

A. Maximum Likelihood Estimator based on $n_k(d_k)$

Turning the likelihood function presented in (4) into a log-likelihood function, we have

$$\log \mathcal{L}(A_k; Y_k) = n_k(d_k) \log(A_k\beta_k) + (n - n_k(d_k)) \log(1 - A_k\beta_k). \qquad (7)$$

Differentiating (7) wrt. $A_k$ and letting the derivative be 0, we have

$$\frac{n_k(d_k)}{A_k} - \frac{(n - n_k(d_k))\beta_k}{1 - A_k\beta_k} = 0, \qquad (8)$$

and then

$$A_k\beta_k = \frac{n_k(d_k)}{n}. \qquad (9)$$


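Since neither $A_k$ nor $\beta_k$ can be solved from (9), we need to consider other correlations to derive a MLE. Given the
i.i.d. model assumed previously and the multicast used in probing, we have the following equation linking the observation of
$R(k)$ to $\beta_k$:

$$1 - \beta_k = \prod_{j \in d_k}\left(1 - \frac{\gamma_j}{A_k}\right). \qquad (10)$$

Solving $\beta_k$ from (10) and using it in (9), we have a MLE as

$$1 - \frac{n_k(d_k)}{n \cdot A_k} = \prod_{j \in d_k}\left(1 - \frac{\gamma_j}{A_k}\right). \qquad (11)$$

Writing $\hat{\gamma}_k$ for $n_k(d_k)/n$, the empirical value of $\gamma_k$, and using the empirical values $\hat{\gamma}_j$ in place of $\gamma_j$, we have
a likelihood equation as follows:

$$1 - \frac{\hat{\gamma}_k}{A_k} = \prod_{j \in d_k}\left(1 - \frac{\hat{\gamma}_j}{A_k}\right), \qquad (12)$$

which is identical to the estimator proposed in [?].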
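Because (12) is a polynomial of degree $|d_k| - 1$ in $A_k$, it generally has to be solved numerically. The sketch below uses plain bisection on $[\hat\gamma_k, 1]$; it assumes the residual is non-positive at $A_k = \hat\gamma_k$ and positive at $A_k = 1$, which holds for data consistent with $A_k < 1$, and it performs no other validation. To check it, the inputs are the expected values generated from a hypothetical $A_k = 0.9$ and three subtrees, so the root should come back as 0.9.

```python
from math import prod

def minc_residual(A, gamma_k, gammas):
    """LHS minus RHS of (12): (1 - gamma_k / A) - prod_j (1 - gamma_j / A)."""
    return (1 - gamma_k / A) - prod(1 - g / A for g in gammas)

def solve_A(gamma_k, gammas, tol=1e-12):
    """Bisection for the root of (12) in [gamma_k, 1] (assumes one sign change there)."""
    lo, hi = gamma_k, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if minc_residual(mid, gamma_k, gammas) > 0:
            hi = mid          # residual already positive: root lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Expected values for a hypothetical A_k = 0.9; p_j is the probability that a probe
# at node k reaches at least one receiver of subtree j (alpha_j * beta_j).
A_true, p = 0.9, [0.8, 0.7, 0.9]
gammas = [A_true * pj for pj in p]                 # gamma_j = A_k * alpha_j * beta_j
gamma_k = A_true * (1 - prod(1 - pj for pj in p))  # gamma_k = A_k * beta_k
print(solve_A(gamma_k, gammas))                    # approximately 0.9
```

Bisection is used only because it is easy to verify; Newton-Raphson or an explicit polynomial root finder would converge faster on the same residual.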


B. Predictor and Observation 


To make the correlations involved in (11) visible, we expand the left-hand side (LHS) and the right-hand side (RHS) of (11),
where the terms obtained from the LHS are called observations and the terms from the RHS are called correlations. A
correlation is also called a predictor since it predicts the observation received in an experiment. For instance, $\gamma_i\gamma_j/A_k, i, j \in
d_k \wedge i \ne j$ is the predictor of the probes simultaneously observed by the receivers attached to subtree $i$ and subtree $j$, i.e.
probes seen by at least one receiver from each subtree, since $\gamma_i\gamma_j/A_k = A_k (\gamma_i/A_k)(\gamma_j/A_k)$ is the probability that a probe
reaches node $k$ and then both subtrees.

To represent the correlations involved in (11), a σ-algebra $S_k$ is created over $d_k$, and we let $\Sigma_k = S_k \setminus \emptyset$ be the
non-empty sets in $S_k$. Each member of $\Sigma_k$ corresponds to a pair of a predictor and its observation. If the number of elements
in a member of $\Sigma_k$ is defined as the degree of the correlation, $\Sigma_k$ can be divided into $|d_k|$ exclusive groups, one for each
degree of correlation from 1 to $|d_k|$. Let $S_k(i), i \in \{1, \dots, |d_k|\}$ denote the group of $i$-degree correlations. For
example, if $d_k = \{p, q, r, s\}$, then $S_k(2) = \{\{p, q\}, \{p, r\}, \{p, s\}, \{q, r\}, \{q, s\}, \{r, s\}\}$ consists of the pairwise
correlations in $d_k$, and $S_k(3) = \{\{p, q, r\}, \{p, q, s\}, \{p, r, s\}, \{q, r, s\}\}$ contains all of the triplet-wise correlations.

Given $\Sigma_k$, $n_k(d_k)$ can be decomposed into the probes that are observed simultaneously by the members of $\Sigma_k$, defined
as follows: if $x \in \Sigma_k$ and $|x| > 1$, a probe is observed by $x$ if and only if, for every $j \in x$, at least one receiver attached to
subtree $j$ observes the probe. We call such an observation a simultaneous observation. To explicitly express $n_k(d_k)$ by
$n_j(d_j), j \in d_k$, the function $I_k(x), x \in \Sigma_k$ is introduced to return the number of probes observed simultaneously by $x$ in an
experiment. Let $u_j^i$ be the observation of $R(j)$ for probe $i$, defined as:

$$u_j^i = \bigvee_{q \in R(j)} y_q^i,$$

then

$$I_k(x) = \sum_{i=1}^{n} \bigwedge_{j \in x} u_j^i.$$

If $x = \{j\}$,

$$I_k(x) = n_j(d_j), \quad j \in d_k.$$

Given the above, $n_k(d_k)$ can be decomposed as:

$$n_k(d_k) = \sum_{i=1}^{|d_k|} (-1)^{i-1} \sum_{x \in S_k(i)} I_k(x). \qquad (14)$$

(14) states that $n_k(d_k)$ is equal to an alternating sum of the counts $I_k(x), x \in S_k(i)$, which overlap each other. To ensure
each probe observed by $R(k)$ is counted once and only once in $n_k(d_k)$, the alternating adding and subtracting operations
(the inclusion-exclusion principle) are needed to eliminate the duplication.
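The groups $S_k(i)$ and the decomposition (14) can be checked directly on a small 0/1 table. In the sketch below, `u[i][j]` holds $u_j^i$ for a hypothetical node $k$ with three subtrees (indexed 0, 1, 2) and five probes; the inclusion-exclusion sum must reproduce the direct count of probes seen by at least one subtree.

```python
from itertools import combinations

def groups(d):
    """S_k(i), i = 1..|d|: all i-element subsets of the descendant set d."""
    return {i: list(combinations(d, i)) for i in range(1, len(d) + 1)}

def I_k(u, x):
    """I_k(x): probes observed simultaneously by every subtree j in x (u[i][j] = u_j^i)."""
    return sum(1 for row in u if all(row[j] for j in x))

def n_k_inclusion_exclusion(u, d):
    """Right-hand side of (14): alternating sum of I_k(x) over the groups S_k(i)."""
    return sum((-1) ** (i - 1) * sum(I_k(u, x) for x in xs)
               for i, xs in groups(d).items())

# Hypothetical observations u_j^i: rows are probes, columns are subtrees 0, 1, 2.
u = [[1, 0, 0],
     [0, 1, 1],
     [1, 1, 1],
     [0, 0, 0],
     [0, 0, 1]]
d = (0, 1, 2)
direct = sum(1 for row in u if any(row))      # n_k(d_k) counted directly
print(groups(d)[2])                           # [(0, 1), (0, 2), (1, 2)]
print(direct, n_k_inclusion_exclusion(u, d))  # 4 4
```

Here the singletons contribute $2 + 2 + 3 = 7$, the pairs subtract $1 + 1 + 2 = 4$, and the triple adds back 1, giving 4.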
C. Correspondence between Predictors and Observations

Given (14), we are able to prove that the MLE proposed in [?] considers all of the correlations in $\Sigma_k$, and we have the following
theorem.

Theorem 2: 1) (11) is a full likelihood estimator that considers all of the correlations in $\Sigma_k$;

2) (11) consists of observed values and their predictors, one for each member of $\Sigma_k$; and

3) the estimate obtained from (11) is a fit that minimises the alternating differences between observed values and corresponding
predictors.

Proof: (11) is a full likelihood estimator that considers all of the correlations in $d_k$. To prove 2) and 3), we expand both
sides of (11) to pair the observed values with their predictors according to $S_k$. We take three steps to achieve the goal.

1) If we use (14) to replace $n_k(d_k)$ on the LHS of (11), the LHS becomes:

$$1 - \frac{n_k(d_k)}{n \cdot A_k} = 1 - \frac{1}{n \cdot A_k} \sum_{i=1}^{|d_k|} (-1)^{i-1} \sum_{x \in S_k(i)} I_k(x). \qquad (15)$$

2) If we expand the product term on the RHS of (11), we have:

$$\prod_{j \in d_k}\left(1 - \frac{\gamma_j}{A_k}\right) = 1 + \sum_{i=1}^{|d_k|} (-1)^{i} \sum_{x \in S_k(i)} \frac{\prod_{j \in x}\gamma_j}{A_k^{i}}, \qquad (16)$$

where the alternating adding and subtracting operations remove the impact of redundant observation.

3) Deducting 1 from both (15) and (16) and then multiplying the results by $A_k$, (11) turns into

$$\sum_{i=1}^{|d_k|} (-1)^{i} \sum_{x \in S_k(i)} \frac{I_k(x)}{n} = \sum_{i=1}^{|d_k|} (-1)^{i} \sum_{x \in S_k(i)} \frac{\prod_{j \in x}\gamma_j}{A_k^{i-1}}. \qquad (17)$$

It is clear that there is a correspondence between the terms across the equal sign, where the terms on the LHS are the observed
values and the terms on the RHS are the predictors. If we rewrite (17) as

$$\sum_{i=1}^{|d_k|} (-1)^{i} \sum_{x \in S_k(i)} \left(\frac{I_k(x)}{n} - \frac{\prod_{j \in x}\gamma_j}{A_k^{i-1}}\right) = 0, \qquad (18)$$

the correspondence between correlations and observed values becomes obvious. ■
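The algebra of steps 1) to 3) can be verified numerically: for any trial value of $A_k$, the left-hand side of (18) equals $A_k$ times the difference between the two sides of (11), provided $n_k(d_k)$ satisfies (14). The sketch below uses a hypothetical 0/1 table $u_j^i$ and the empirical values $\hat\gamma_j = I_k(\{j\})/n$; the function names are illustrative only.

```python
from itertools import combinations
from math import isclose, prod

def residual_11(A, n, nkdk, gammas):
    """A_k * (LHS - RHS) of (11)."""
    return A * ((1 - nkdk / (n * A)) - prod(1 - g / A for g in gammas))

def residual_18(A, n, I, gammas):
    """LHS of (18): sum_i (-1)^i sum_{x in S_k(i)} (I_k(x)/n - prod_{j in x} gamma_j / A^(i-1))."""
    m = len(gammas)
    return sum((-1) ** i * sum(I[x] / n - prod(gammas[j] for j in x) / A ** (i - 1)
                               for x in combinations(range(m), i))
               for i in range(1, m + 1))

# Hypothetical u_j^i for three subtrees (columns) and five probes (rows).
u = [[1, 0, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 1]]
n = len(u)
I = {x: sum(1 for row in u if all(row[j] for j in x))          # I_k(x)
     for i in range(1, 4) for x in combinations(range(3), i)}
nkdk = sum(1 for row in u if any(row))                           # n_k(d_k)
gammas = [I[(j,)] / n for j in range(3)]                         # empirical gamma_j
for A in (0.6, 0.85, 0.99):
    print(isclose(residual_11(A, n, nkdk, gammas),
                  residual_18(A, n, I, gammas), abs_tol=1e-12))   # True
```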

To distinguish this MLE from those proposed in this paper, we call it the original MLE in the rest of the paper.


V. Explicit Estimators based on Composite Likelihood 

(18) shows that the original MLE takes into account all of the correlations in $\Sigma_k$. If the number of subtrees rooted at node
$k$ is larger than 5, the likelihood equation is a polynomial of degree $|d_k| - 1 \ge 5$, which in general cannot be solved
analytically. To have an explicit estimator in those circumstances, we need to reduce the number of correlations considered in
estimation, and there are a number of strategies to achieve this. We propose three of them here and use composite likelihood,
also called pseudo-likelihood by Besag in [?], to create likelihood functions for the strategies. The three strategies are named
reduce scaled, block-wised, and individual based, respectively. The reduce scaled strategy, as named, is a small version of the
original MLE that selectively removes a number of subtrees rooted at node $k$ from consideration and then uses the maximum
likelihood principle on the rest to estimate $A_k$. The block-wised strategy differs from the reduce scaled one by dividing all of the
correlations considered by the original MLE into a number of blocks, one for each degree of correlation, from pairwise to
$|d_k|$-wise. The individual based one, in contrast to the other two, considers one correlation at a time, which leads to a large
number of estimators.


A. Reduce Scaled Estimator (RSE) 

Rather than considering all of the correlations in $\Sigma_k$, the correlations can be divided into groups according to the subtrees
rooted at node $k$. Let $x, x \subset d_k$ be the group to be considered by an estimator in RSE. The log-likelihood function considering
the correlations in $x$ is as follows:

$$\log \mathcal{L}(A_k; Y_x) = n_k(x) \log(A_k\beta_k(x)) + (n - n_k(x)) \log(1 - A_k\beta_k(x)), \qquad (19)$$

where $n_k(x)$, as defined in Section III-D, is the number of probes reaching node $k$ confirmed by the observations of the receivers
attached to $T(j), j \in x$, and $\beta_k(x)$ is the pass rate of $T(j), j \in x$, which can be expressed as

$$1 - \beta_k(x) = \prod_{j \in x}\left(1 - \frac{\gamma_j}{A_k}\right). \qquad (20)$$

Then, a likelihood equation similar to (11) is obtained:

$$1 - \frac{n_k(x)}{n \cdot A_k} = \prod_{j \in x}\left(1 - \frac{\gamma_j}{A_k}\right). \qquad (21)$$

If $|x| \le 5$, (21) is a polynomial of degree at most 4 in $A_k$ and is solvable analytically. The estimators in RSE are denoted by
$A^m_k(x), x \subset d_k$.

B. Block-wised Estimator (BWE) 

(18) shows that the correlations involved in the original MLE can be divided into $|d_k| - 1$ blocks, from pairwise to $|d_k|$-wise.
Each of them can be written as a likelihood function. In order to use a uniform likelihood function for all of them, we let the
likelihood function considering a single correlation be 1. Then, the $i$-wise likelihood function, denoted as $L_c(i; A_k; y)$, can be
expressed uniformly.

Definition 1: There are a number of composite likelihood functions, one for each degree of correlation, varying from pairwise
to $|d_k|$-wise. The composite likelihood function $L_c(i; A_k; y), i \in \{2, \dots, |d_k|\}$ has the following form:

$$L_c(i; A_k; y) = \prod_{x \in S_k(i)} (A_k\beta_k(x))^{n_k(x)} (1 - A_k\beta_k(x))^{n - n_k(x)}, \quad i \in \{2, \dots, |d_k|\}. \qquad (22)$$


Let $A_k(i)$ be the estimator derived from $L_c(i; A_k; y)$. Then, we have the following theorem.

Theorem 3: Each of the composite likelihood equations obtained from (22) is an explicit estimator of $A_k$, as follows:

$$\hat{A}_k(i) = \left(\frac{\sum_{x \in S_k(i)} \prod_{j \in x} \hat{\gamma}_j}{\sum_{x \in S_k(i)} \frac{I_k(x)}{n}}\right)^{\frac{1}{i-1}}, \quad i \in \{2, \dots, |d_k|\}. \qquad (23)$$

Proof: Firstly, we write (22) as a log-likelihood function and differentiate it wrt. $A_k$. As with (9), we cannot solve $A_k$ or
$\beta_k(x)$ directly from the derivative, and we need to consider other correlations as in (10). We then have an equation as

$$\sum_{x \in S_k(i)} \left(1 - \frac{n_k(x)}{n \cdot A_k}\right) = \sum_{x \in S_k(i)} \prod_{q \in x}\left(1 - \frac{\hat{\gamma}_q}{A_k}\right). \qquad (24)$$

The two summations can be expanded as in (15) and (16), and only the terms related to the $i$-wise correlations are left, since all
other terms in the first summation are cancelled by the terms of the second summation. The likelihood equation (23) follows. ■

In the rest of the paper, $A_k(i)$ is used to refer to the $i$-wise estimator, and $\hat{A}_k(i)$ refers to the estimate obtained by $A_k(i)$.
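Estimator (23) needs only one ratio of sums and one root per block. The sketch below is a minimal implementation under an assumed data layout (`gamma_hat[j]` for $\hat\gamma_j$ and a dictionary from sorted index tuples $x$ to $I_k(x)/n$). It is checked on expected values from a hypothetical $A_k = 0.9$, for which $E[I_k(x)]/n = A_k \prod_{j \in x} p_j$ and $\gamma_j = A_k p_j$, so each block's ratio equals $A_k^{i-1}$ exactly.

```python
from itertools import combinations
from math import prod

def block_estimator(gamma_hat, I_over_n, i):
    """A_k(i) from (23): summed predictors over summed observations, then the (i-1)-th root.

    gamma_hat[j] is the empirical value for subtree j; I_over_n maps a sorted
    i-tuple x of subtree indices to I_k(x)/n.
    """
    xs = list(combinations(range(len(gamma_hat)), i))   # S_k(i)
    num = sum(prod(gamma_hat[j] for j in x) for x in xs)
    den = sum(I_over_n[x] for x in xs)
    return (num / den) ** (1.0 / (i - 1))

# Expected values for a hypothetical A_k = 0.9 and three subtrees with pass rates p_j.
A, p = 0.9, [0.8, 0.7, 0.9]
gamma_hat = [A * pj for pj in p]
I_over_n = {x: A * prod(p[j] for j in x)
            for r in (2, 3) for x in combinations(range(3), r)}
print(block_estimator(gamma_hat, I_over_n, 2))   # approximately 0.9
print(block_estimator(gamma_hat, I_over_n, 3))   # approximately 0.9
```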


C. Individual based Estimator (IBE) 

Instead of considering a block of correlations together, we can consider one correlation at a time and obtain a large number of
estimators. Each of them has a likelihood function similar to (19), where $\beta_k(x)$ and $n_k(x)$ are replaced by $\psi_k(x)$ and $I_k(x)$,
respectively. $\psi_k(x) = \prod_{j \in x} \alpha_j\beta_j, x \subseteq d_k$, is the probability that a probe reaching node $k$ is observed by every
$T(j), j \in x$. If $\Sigma'_k = \Sigma_k \setminus S_k(1)$ is the set of correlations considered by IBE, the log-likelihood function for $A_k$ given
observation $I_k(x)$ is equal to

$$\log \mathcal{L}(A_k; I_k(x)) = I_k(x) \log(A_k\psi_k(x)) + (n - I_k(x)) \log(1 - A_k\psi_k(x)), \quad x \in \Sigma'_k. \qquad (25)$$


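We then have the following theorem.

Theorem 4: Given (25), the simultaneous observation of $x$ is a Bernoulli process with success probability $A_k\psi_k(x)$. The MLE
for $A_k$ given $I_k(x)$ is equal to

$$\hat{A}^I_k(x) = \left(\frac{\prod_{j \in x} \hat{\gamma}_j}{\frac{I_k(x)}{n}}\right)^{\frac{1}{|x|-1}}, \quad x \in \Sigma'_k. \qquad (26)$$

Proof: Using the same procedure as that used in Section IV-A, we have the theorem. ■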

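Comparing (23) with (26), we find that $\hat{A}^I_k(x)$, where $|x| = i$, is a type of geometric mean, whereas $\hat{A}_k(i)^{i-1}$ is the
arithmetic mean of $\hat{A}^I_k(x)^{i-1}, x \in S_k(i)$, weighted by $I_k(x)$. Therefore, $\hat{A}_k(i)$ is expected to be more robust than $\hat{A}^I_k(x)$.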
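The relation between (23) and (26) is easiest to see for $i = 2$, where the exponent is 1: the block estimate $\hat A_k(2)$ is the average of the pairwise estimates $\hat A^I_k(x)$ weighted by $I_k(x)$, and therefore lies between the smallest and largest of them. The sketch below checks this on made-up empirical values that are not generated from any model.

```python
from itertools import combinations
from math import isclose, prod

def individual_estimator(gamma_hat, I_over_n, x):
    """A_k^I(x) from (26) for a single correlation x with |x| >= 2."""
    return (prod(gamma_hat[j] for j in x) / I_over_n[x]) ** (1.0 / (len(x) - 1))

# Hypothetical empirical values for three subtrees.
gamma_hat = [0.7, 0.65, 0.8]
I_over_n = {(0, 1): 0.5, (0, 2): 0.62, (1, 2): 0.57}
pairs = list(combinations(range(3), 2))                    # S_k(2)

ibe = {x: individual_estimator(gamma_hat, I_over_n, x) for x in pairs}
bwe = (sum(prod(gamma_hat[j] for j in x) for x in pairs)
       / sum(I_over_n[x] for x in pairs))                  # A_k(2) from (23), exponent 1
weights = {x: I_over_n[x] / sum(I_over_n[y] for y in pairs) for x in pairs}
print(ibe)
print(isclose(bwe, sum(weights[x] * ibe[x] for x in pairs)))   # True
```

For $i > 2$ the same identity holds for the $(i-1)$-th powers, i.e. before the root in (23) is taken.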

Using and combining the strategies presented here, we can obtain various explicit estimators that cover those proposed
previously. For instance, the estimator proposed in [?], [19] is one of them: it divides $d_k$ into two groups and only considers
the pairwise correlations between the members of the two groups. Therefore, although the estimator proposed in [?], [19] is a
MLE in terms of the observation used in estimation, it is not the same as (11).


VI. Properties of the Estimators

It is known that if an MLE is a function of a sufficient statistic, it is asymptotically unbiased, consistent, and asymptotically efficient. Thus, the original MLE and all of the estimators proposed in this paper have these properties. Beyond these, we are interested in whether some of the estimators have further properties, such as unbiasedness, uniqueness, variance, and efficiency, that can be used to evaluate the estimators. This section presents them in a number of theorems and corollaries.


A. Unbiasedness and Uniqueness of $A^l_k(x)$ and $A_k(i)$

This subsection focuses on the unbiasedness of the estimators in IBE and BWE, although the statistic used by the latter is not minimal sufficient. For $A^l_k(x)$, $x \in E_k$, we have the following theorem.

Theorem 5: $A^l_k(x)$ is an unbiased estimator.

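Proof: Let $Z_j$, $j \in d_k$, be the observed pass rate of $T(j)$ for the probes reaching node $k$, and let $\bar{A}_k = N_k/n$ be the sample mean of $A_k$, where $N_k$ is the number of probes reaching node $k$. Note that $Z_j$ and $Z_l$, $j, l \in d_k$, are independent of each other if $j \neq l$. In addition, $Z_j$, $j \in d_k$, is independent of $\bar{A}_k$. Because of this, $\bar{A}_k Z_j$ is used in place of $\gamma_j$ and $\bar{A}_k \prod_{j \in x} Z_j$ in place of $I_k(x)/n$ in the following derivation:

$$E(A^l_k(x)) = E\left[\left(\frac{\prod_{j \in x} \bar{A}_k Z_j}{\bar{A}_k \prod_{j \in x} Z_j}\right)^{\frac{1}{|x|-1}}\right] = E\left(\frac{N_k}{n}\right) = A_k. \qquad (27)$$

The theorem follows. ■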

Given Theorem 5, we have the following corollary.
Corollary 1: $A_k(i)$ is an unbiased estimator.
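Proof: According to Theorem 5, we have

$$E(A_k(i)) = E\left[\left(\frac{\sum_{x \in S_k(i)} \prod_{j \in x} \bar{A}_k Z_j}{\sum_{x \in S_k(i)} \bar{A}_k \prod_{j \in x} Z_j}\right)^{\frac{1}{i-1}}\right] = E\left(\frac{N_k}{n}\right) = A_k. \qquad (28)$$

■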


Given that $A^l_k(x)$, $x \in E_k$, and $A_k(i)$ are unbiased estimators, we can prove the uniqueness of $A_k(i)$.

Theorem 6: If

$$\sum_{x \in S_k(i)} \prod_{j \in x} \gamma_j < \sum_{x \in S_k(i)} \frac{I_k(x)}{n},$$

there is only one solution in (0,1) for $A_k(i)$, $2 \le i \le |d_k|$.


Proof: Since the support of $A_k$ is (0,1), we can reach this conclusion from (21).


B. Efficiency of $A^l_k(x)$, $A^m_k(x)$, and the original MLE

Apart from the asymptotic efficiency stated previously for the MLEs using sufficient statistics, we are interested in the efficiency of the estimators proposed in this paper. Given (25), we have the following theorem for the Fisher information of an observation, $y$, for the estimators in IBE, i.e., $A^l_k(x)$, $x \in E_k$.

Theorem 7: The Fisher information of $y$ on $A^l_k(x)$, $x \subseteq d_k$, is equal to

$$\frac{\psi_k(x)}{A_k(1 - A_k\psi_k(x))}.$$

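Proof: Considering $I_k(x) = y$ as the observation of the receivers attached to $x$, we have the following likelihood function of the observation:

$$L(A_k \mid y) = y\log(A_k\psi_k(x)) + (1 - y)\log(1 - A_k\psi_k(x)). \qquad (29)$$

Differentiating (29) with respect to $A_k$, we have

$$\frac{\partial L(A_k \mid y)}{\partial A_k} = \frac{y}{A_k} - \frac{(1 - y)\psi_k(x)}{1 - A_k\psi_k(x)}. \qquad (30)$$

We then have

$$\frac{\partial^2 L(A_k \mid y)}{\partial A_k^2} = -\frac{y}{A_k^2} - \frac{(1 - y)\psi_k(x)^2}{(1 - A_k\psi_k(x))^2}. \qquad (31)$$

If $I(A^l_k(x) \mid y)$ is used to denote the Fisher information of observation $y$ for $A_k$ in $A^l_k(x)$, then, since $E(y) = A_k\psi_k(x)$, we have

$$I(A^l_k(x) \mid y) = \frac{E(y)}{A_k^2} + \frac{E(1 - y)\psi_k(x)^2}{(1 - A_k\psi_k(x))^2} = \frac{\psi_k(x)}{A_k} + \frac{\psi_k(x)^2}{1 - A_k\psi_k(x)} = \frac{\psi_k(x)}{A_k(1 - A_k\psi_k(x))}, \qquad (32)$$

which is the information provided by $y$ for $A_k$. ■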
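The algebra from (29) to (32) can be checked numerically. The sketch below (all function names are hypothetical) computes the expectation of the negative of (31) under $y \sim \mathrm{Bernoulli}(A_k\psi_k(x))$ and compares it with the closed form in (32); it also compares (31) with a central finite difference of (29).

```python
import math


def log_lik(y, a_k, psi):
    """Single-observation log-likelihood (29); y = 1 if every T(j), j in x, sees the probe."""
    return y * math.log(a_k * psi) + (1 - y) * math.log(1.0 - a_k * psi)


def neg_second_derivative(y, a_k, psi):
    """Negative of (31): y / A_k^2 + (1 - y) psi^2 / (1 - A_k psi)^2."""
    return y / a_k**2 + (1 - y) * psi**2 / (1.0 - a_k * psi) ** 2


def fisher_info_by_expectation(a_k, psi):
    """E[-d^2 L / dA_k^2] with y ~ Bernoulli(A_k psi): the middle expression of (32)."""
    p = a_k * psi
    return p * neg_second_derivative(1, a_k, psi) + (1 - p) * neg_second_derivative(0, a_k, psi)


def fisher_info_closed_form(a_k, psi):
    """Right-hand side of (32): psi / (A_k (1 - A_k psi))."""
    return psi / (a_k * (1.0 - a_k * psi))


def numeric_second_derivative(y, a_k, psi, h=1e-4):
    """Central finite difference of (29) in A_k, used as a check on (31)."""
    return (log_lik(y, a_k + h, psi) - 2.0 * log_lik(y, a_k, psi)
            + log_lik(y, a_k - h, psi)) / h**2


if __name__ == "__main__":
    for a_k, psi in [(0.99, 0.98), (0.9, 0.8), (0.5, 0.3)]:
        print(a_k, psi, fisher_info_closed_form(a_k, psi),
              fisher_info_by_expectation(a_k, psi))
```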

Given (32), we are able to obtain a formula for the Fisher information of the original MLE and the estimators in RSE. To achieve this, let $\beta_k(d_k) = \beta_k$. Then, we have the following corollary.

Corollary 2: The Fisher information of observation $y$ for $A_k$ in the original MLE and $A^m_k(x)$, $x \subseteq d_k$, is equal to

$$\frac{\beta_k(x)}{A_k(1 - A_k\beta_k(x))}. \qquad (33)$$

Proof: Replacing $n_k(d_k)$ or $n_k(x)$ by $y$, and $n - n_k(d_k)$ or $n - n_k(x)$ by $1 - y$, in the likelihood function of the original MLE and in (19), respectively, and then using the same procedure as that used in the proof of Theorem 7, the corollary follows. ■

Because of the similarity between (32) and (33), the two equations have the same features in terms of support, singularity, and maxima. After eliminating the singular points, the support of $A_k$ is (0,1) and the support of $\beta_k(x)$ (or $\psi_k(x)$) is [0,1]. Both (32) and (33) are convex functions in the support and reach their maxima at the points $A_k \to 1, \beta_k(x) = 1$ (or $\psi_k(x) = 1$) and $A_k \to 0, \beta_k(x) = 1$ (or $\psi_k(x) = 1$). Given $A_k$, (33) is a monotonically increasing function of $\beta_k(x)$, whereas (32) is a monotonically increasing function of $\psi_k(x)$.

Despite the similarity between (32) and (33), $A^m_k(x)$ and $A^l_k(x)$ react differently in terms of efficiency if $x$ is replaced by $y$, $x \subset y$, which leads to two corollaries, one for each of them.

Corollary 3: $A^m_k(y)$ is more efficient than $A^m_k(x)$, $x \subset y$.

Proof: If $x \subset y$, $\beta_k(x) < \beta_k(y)$, and then we have the corollary. ■

For $A^l_k(x)$, we have

Corollary 4: The efficiency of $A^l_k(x)$, $x \in E_k$, forms a partial order that is identical to that formed by inclusion on the members of $E_k$, where the most efficient estimator must be one of the $A^l_k(x)$, $x \in S_k(2)$, and the least efficient one must be $A^l_k(d_k)$.

Proof: According to Theorem 7, the efficiency of $A^l_k(x)$ is determined by $\psi_k(x)$, where $\psi_k(x) = \prod_{j \in x} \alpha_j\beta_j$. If $x \subset y$, we have

$$\psi_k(y) = \prod_{j \in y} \alpha_j\beta_j = \psi_k(x) \prod_{j \in y \setminus x} \alpha_j\beta_j \le \psi_k(x). \qquad (34)$$

Therefore, the order of the efficiency of $A^l_k(x)$, $x \in E_k$, follows that of inclusion in $E_k$, where the $x \in S_k(2)$ are the members of $E_k$ that have the minimal number of elements. In contrast to $\psi_k(x)$, $x \in S_k(2)$, $\psi_k(d_k) \le \psi_k(x)$, since $x \subseteq d_k$ for all $x \in E_k$. Then, the corollary follows. ■
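Corollary 4 can be illustrated by brute force. The Python sketch below enumerates $E_k$, read here as all subsets of the children of node $k$ with at least two members, computes the per-probe Fisher information (32) for each $A^l_k(x)$, and checks that $x \subset y$ never gives $y$ more information than $x$. The list `child_pass`, standing in for $\alpha_j\beta_j$, and all function names are illustrative assumptions.

```python
import math
from itertools import combinations


def ibe_correlations(num_children):
    """E_k read as every subset of the children with at least two members."""
    return [frozenset(c) for r in range(2, num_children + 1)
            for c in combinations(range(num_children), r)]


def psi(x, child_pass):
    """psi_k(x) = prod_{j in x} alpha_j beta_j; child_pass[j] stands in for alpha_j beta_j."""
    return math.prod(child_pass[j] for j in x)


def fisher_info(a_k, x, child_pass):
    """Per-probe Fisher information (32) of A^l_k(x)."""
    p = psi(x, child_pass)
    return p / (a_k * (1.0 - a_k * p))


def check_inclusion_order(a_k, child_pass):
    """True if every proper inclusion x < y gives info(x) >= info(y), as Corollary 4 states."""
    e_k = ibe_correlations(len(child_pass))
    return all(fisher_info(a_k, x, child_pass) >= fisher_info(a_k, y, child_pass)
               for x in e_k for y in e_k if x < y)


if __name__ == "__main__":
    child_pass = [0.99, 0.95, 0.97, 0.90]  # illustrative values only
    e_k = ibe_correlations(len(child_pass))
    print("inclusion order respected:", check_inclusion_order(0.98, child_pass))
    print("most efficient x:", sorted(max(e_k, key=lambda x: fisher_info(0.98, x, child_pass))))
    print("least efficient x:", sorted(min(e_k, key=lambda x: fisher_info(0.98, x, child_pass))))
```

For these illustrative pass rates the sketch prints `[0, 2]` as the most efficient subset (the pair of children with the highest pass rates) and `[0, 1, 2, 3]` as the least efficient one, matching the corollary's statement that the extremes are a member of $S_k(2)$ and $d_k$.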


C. Variance of $A^l_k(x)$, $A^m_k(x)$, and the original MLE

The original MLE, $A^m_k(x)$, and $A^l_k(x)$ are MLEs that focus differently on the observations obtained. Despite their differences, they share a number of features, including the form of the likelihood function and of the Fisher information. In addition, their variances are expressed by a general function showing the connection between $A_k$ and the pass rate of the subtrees considered in estimation. Let mle denote all of them. Then, we have a theorem for the variances of the estimators in mle.

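Theorem 8: The variances of the estimators in mle are equal to

$$var(mle) = \frac{A_k(1 - A_k\delta_k(x))}{\delta_k(x)}, \qquad (35)$$

where

$$\delta_k(x) = \begin{cases} \beta_k(x), & \text{for the original MLE and } A^m_k(x);\\ \psi_k(x), & \text{for } A^l_k(x). \end{cases}$$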

Proof: The passing process described by (25) is a Bernoulli process that belongs to the exponential family and satisfies the regularity conditions presented in Q. Thus, the variance of an estimator in mle reaches the Cramer-Rao bound, which is the reciprocal of the Fisher information. ■

(35) can be written as

$$var(mle) = \frac{A_k}{\delta_k(x)} - A_k^2, \qquad (36)$$

which shows:

1) the estimates obtained by an estimator spread out more widely than those obtained by direct measurement, whose variance is $A_k - A_k^2$. The width is determined by $\delta_k(x)$, the pass rate of the subtrees connecting node $k$ to the observers. If $\delta_k(x) = 1$, there is no further spread beyond that of direct measurement. Otherwise, the variance increases super-linearly as $\delta_k(x)$ decreases;

2) the variance of the estimates obtained by an estimator increases monotonically with the depth of the subtree rooted at node $k$, since the pass rate of a subtree decreases with its depth, i.e., the pass rate of an $i$-level tree is larger than that of the $(i+1)$-level tree extended from it;

3) the variance of the estimates obtained by an estimator in mle is a monotonically decreasing function of $\delta_k(x)$.

These three points confirm some of the experimental results reported previously, such as the dependency of variance on topology reported in yt]. Note that, despite 3), the variance of $A^m_k(x)$ can be the same as that of $A^m_k(y)$, $x \subset y$, if $\beta_k(x) = \beta_k(y)$. The same holds for $A^l_k(x)$. In other words, if the probes observed by $R(j)$, $j \in y \setminus x$, are included in those observed by $R(i)$, $i \in x$, the estimate obtained by $A^m_k(x)$ is the same as that obtained by $A^m_k(y)$.
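Formulas (35) and (36) are easy to tabulate. The sketch below evaluates the per-probe variance for a few values of $\delta_k(x)$ and reports its ratio to the direct-measurement variance $A_k - A_k^2$, which illustrates points 1) and 3). The function names are hypothetical, and dividing by the number of probes $n$ to get the variance for $n$ independent probes is an assumption of the sketch rather than a statement from the text.

```python
import math


def mle_variance(a_k, delta):
    """Per-probe variance (35): A_k (1 - A_k delta) / delta.

    delta stands for delta_k(x): beta_k(x) for the original MLE and A^m_k(x),
    psi_k(x) for A^l_k(x). Divide by n for n independent probes (assumption).
    """
    if not (0.0 < a_k < 1.0 and 0.0 < delta <= 1.0):
        raise ValueError("need 0 < A_k < 1 and 0 < delta <= 1")
    return a_k * (1.0 - a_k * delta) / delta


def direct_measurement_variance(a_k):
    """Bernoulli variance A_k - A_k^2, i.e. (35) with delta = 1."""
    return a_k - a_k**2


if __name__ == "__main__":
    a_k = 0.99
    for delta in (1.0, 0.99, 0.95, 0.9, 0.8, 0.5):
        v = mle_variance(a_k, delta)
        ratio = v / direct_measurement_variance(a_k)
        print(f"delta={delta:.2f}  var={v:.5f}  ratio to direct={ratio:.2f}")
```

The printed ratios grow quickly as `delta` falls below 1, which is the super-linear spread described in point 1).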
D. Efficiency and Variance of BWE

As stated, the estimate obtained by $A_k(i)$ is a type of arithmetic mean of $A^l_k(x)$, $x \in S_k(i)$, which has the same advantages and disadvantages as the arithmetic mean. Thus, $A_k(i)$ is more robust and efficient than $A^l_k(x)$, $x \in S_k(i)$, since the former considers more probes in estimation than the latter, although some of the probes may be considered more than once. Because of this, (32) cannot be used to evaluate the efficiency of an estimator in BWE. Despite this, we can bound the information obtained by $A_k(i)$ in terms of

$$\frac{\psi_k(x)}{A_k(1 - A_k\psi_k(x))}, \quad x \in S_k(i). \qquad (37)$$

In addition, $A_k(i)$ is at least as efficient as $A_k(i+1)$, and the variance of $A_k(i)$ is no larger than that of $A_k(i+1)$, since

$$\sum_{x \in S_k(i)} \psi_k(x) < \sum_{x \in S_k(i+1)} \psi_k(x).$$


E. Example 

We conclude this section with an example that illustrates the differences between the variances of the estimates obtained by four estimators: direct measurement, the original MLE, $A^l_k(x)$ with $|x| = 2$, and $A^l_k(d_k)$. The setting used here is identical to that presented in H], where node $k$ has three children, each with a pass rate of $a$, $0 < a < 1$, and the pass rate from the root to node $k$ is also equal to $a$. Using (35), we obtain their variances as follows:

1) $a - a^2$,

2) $\frac{1}{3(1-a) + a^2} - a^2$,

3) $\frac{1}{a} - a^2$, and

4) $\frac{1}{a^2} - a^2$.

The difference between them becomes obvious as $a$ decreases from 1 to 0.99, where the variances of the four estimators change from 0 to 0.01, 0.01, 0.03, and 0.04, respectively. The variance of $A^l_k(d_k)$ is about 4 times that of the original MLE, which is significantly different from that obtained in […]. Although the variances decrease as the number of multicast probes increases, the ratio between them remains.
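The four variances of the example can be reproduced from (35). The sketch below assumes that, for the original MLE, $\beta_k(d_k) = 1 - (1-a)^3$, i.e., the probability that at least one of the three children observes a probe that reached node $k$; this reading reproduces item 2) exactly. The function name and dictionary keys are illustrative labels.

```python
import math


def example_variances(a):
    """Per-probe variances of the four estimators in the example, via (35).

    Node k has three children, each with pass rate a, and the root-to-k
    pass rate is also a, so A_k = a.
    """
    def var(delta):
        # (35) with A_k = a
        return a * (1.0 - a * delta) / delta

    return {
        "direct": var(1.0),                     # delta = 1: direct measurement
        "original_mle": var(1 - (1 - a) ** 3),  # beta_k(d_k): some child sees the probe
        "ibe_pair": var(a ** 2),                # psi_k(x) with |x| = 2
        "ibe_all": var(a ** 3),                 # psi_k(d_k) with |d_k| = 3
    }


if __name__ == "__main__":
    for name, v in example_variances(0.99).items():
        print(f"{name:>12}: {v:.4f}")
```

For $a = 0.99$ the rounded values are 0.01, 0.01, 0.03, and 0.04, and the ratio of the last one to that of the original MLE is about 4.06.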


VII. Model Selection and Simulation 

The large number of estimators in IBE, RSE, and BWE, plus the original MLE, makes model selection possible. However, finding the most suitable one in terms of efficiency and computational complexity is a hard task, since the two goals conflict with each other. Although one could identify the most suitable estimator by computing the Kullback-Leibler divergence or the composite Kullback-Leibler divergence of the estimators, the cost of computing the Akaike information criterion (AIC) for each of the estimators makes this approach prohibitive. Nevertheless, the derivation of (33) solves the problem to some degree, since (33) shows that the most suitable estimator should have a larger $\beta_k(x)$, which can be obtained from end-to-end observation since $\beta_k(x) \propto \prod_{j \in x}\gamma_j$.
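A minimal sketch of this selection rule, assuming the end-to-end pass rates $\gamma_j$ of the children of node $k$ have already been measured: among all candidate subsets of a given size, pick the one with the largest $\prod_{j \in x}\gamma_j$. Restricting the comparison to a fixed $|x|$ is a choice of this sketch, because the proportionality constant is not spelled out here and may depend on $|x|$; `select_subset` is a hypothetical helper, not part of the paper.

```python
import math
from itertools import combinations


def select_subset(gammas, size):
    """Return the size-`size` subset x of children maximising prod_{j in x} gamma_j.

    gammas[j] is the measured end-to-end pass rate for the receivers under
    child j. Ties are broken by the first subset in lexicographic order.
    """
    return max(combinations(range(len(gammas)), size),
               key=lambda x: math.prod(gammas[j] for j in x))


if __name__ == "__main__":
    # Illustrative pass rates: root link 5% loss, children alternating 1% / 5% loss.
    gammas = [0.95 * 0.99, 0.95 * 0.95] * 4
    print(select_subset(gammas, 2), select_subset(gammas, 3))
```

With these illustrative values the sketch picks only children whose own loss rate is 1%, which is the kind of choice made for $A^l_k(x)$ in the third round of the simulation below.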
TABLE II

Simulation Result of an 8-Descendant Tree with Loss Rate = 1%

Estimators  OMLE              A_k(2)            A_k(3)            A^l_k(x), |x|=2   A^l_k(x), |x|=3
Samples     Mean    Var       Mean    Var       Mean    Var       Mean    Var       Mean    Var
300         0.0088  1.59E-05  0.0088  1.59E-05  0.0088  1.64E-05  0.0087  1.59E-05  0.0087  1.61E-05
900         0.0092  7.76E-06  0.0092  7.82E-06  0.0091  7.84E-06  0.0092  7.90E-06  0.0092  8.15E-06
1500        0.0096  4.55E-06  0.0096  4.55E-06  0.0096  4.80E-06  0.0096  4.78E-06  0.0096  4.33E-06
2100        0.0097  3.14E-06  0.0097  3.11E-06  0.0097  3.14E-06  0.0097  3.02E-06  0.0097  3.08E-06
2700        0.0100  1.72E-06  0.0100  1.72E-06  0.0100  1.74E-06  0.0100  1.81E-06  0.0100  1.83E-06


TABLE III

Simulation Result of an 8-Descendant Tree, 6 of the 8 Have Loss Rate = 1% and the Other 2 Have Loss Rate = 5%

Estimators  OMLE              A_k(2)            A_k(3)            A^l_k(x), |x|=2   A^l_k(x), |x|=3
Samples     Mean    Var       Mean    Var       Mean    Var       Mean    Var       Mean    Var
300         0.0088  1.59E-05  0.0089  1.64E-05  0.0089  1.68E-05  0.0091  2.36E-05  0.0088  1.95E-05
900         0.0091  7.76E-06  0.0091  7.80E-06  0.0091  7.83E-06  0.0092  9.74E-06  0.0091  8.67E-06
1500        0.0096  4.55E-06  0.0096  4.72E-06  0.0096  4.81E-06  0.0097  4.36E-06  0.0096  4.45E-06
2100        0.0097  3.14E-06  0.0097  3.11E-06  0.0097  3.11E-06  0.0098  3.39E-06  0.0097  3.04E-06
2700        0.0100  1.72E-06  0.0100  1.69E-06  0.0100  1.67E-06  0.0101  2.11E-06  0.0100  1.90E-06


A. Simulation 

To compare the effectiveness, robustness, and sensitivity of the original MLE and $A^l_k(x)$, three rounds of simulations are conducted in three settings. The multicast tree used in the simulations has a path/link from the root to node $k$, and node $k$ has 8 subtrees connecting to the receivers. Five estimators, the original MLE (OMLE), $A_k(2)$, $A_k(3)$, $A^l_k(x)$ with $|x| = 2$, and $A^l_k(x)$ with $|x| = 3$, are compared against each other in the simulation. The number of samples used in the simulations varies from 300 to 2700 in steps of 600. For each sample size, 20 experiments with different initial seeds are carried out, and the means and variances of the estimates obtained by the five estimators are presented in three tables, from Table II to Table IV.
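The experimental procedure can be outlined in Python as follows. This is only a sketch under stated assumptions, not the code behind Tables II to IV: it evaluates only the two IBE estimators, uses children 0 and 1 (and 0, 1, 2) as the subsets $x$, takes the sample variance over the repeated experiments, and all names (`run_round`, `ibe_loss_estimate`) are hypothetical. Its numbers will not match the tables, since the random number generator and seeds differ.

```python
import math
import random
import statistics


def ibe_loss_estimate(outcomes, x):
    """Loss-rate estimate 1 - A^l_k(x), with A^l_k(x) computed as in (26)."""
    n = len(outcomes)
    gammas = [sum(j in o for o in outcomes) / n for j in x]
    joint = sum(set(x) <= o for o in outcomes) / n  # I_k(x) / n
    return 1.0 - (math.prod(gammas) / joint) ** (1.0 / (len(x) - 1))


def run_round(sample_sizes, n_experiments, root_loss, child_loss, subsets, seed=0):
    """Hypothetical driver: (mean, sample variance) of the IBE loss estimates per sample size."""
    rng = random.Random(seed)
    results = {}
    for n in sample_sizes:
        estimates = {x: [] for x in subsets}
        for _experiment in range(n_experiments):
            outcomes = []
            for _probe in range(n):
                if rng.random() < root_loss:
                    outcomes.append(frozenset())  # lost before node k
                else:
                    # each child subtree passes the probe independently
                    outcomes.append(frozenset(
                        j for j, loss in enumerate(child_loss) if rng.random() >= loss))
            for x in subsets:
                estimates[x].append(ibe_loss_estimate(outcomes, x))
        results[n] = {x: (statistics.mean(v), statistics.variance(v))
                      for x, v in estimates.items()}
    return results


if __name__ == "__main__":
    # First-round setting: root link and all 8 subtrees at 1% loss.
    res = run_round(range(300, 2701, 600), 20, 0.01, [0.01] * 8,
                    [(0, 1), (0, 1, 2)], seed=42)
    for n, row in res.items():
        print(n, {x: (round(m, 4), f"{v:.2E}") for x, (m, v) in row.items()})
```

The other two rounds only change `child_loss`, `root_loss`, and the chosen subsets.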

Table II shows the results obtained in the first round, which sets the loss rate of the subtrees to 1%, as well as the loss rate of the path from the root to node $k$. The results show that when the sample size is small, the estimates obtained by all estimators drift away from the true value, which indicates that the data obtained are not enough. As the sample size increases, the estimates gradually approach the true value and all of the estimators achieve a similar outcome. As expected, the variances decrease as the sample size increases, which agrees with (35), and there is no significant difference among the estimators, since all of the subtrees connected to node $k$ have the same loss rate. Despite this, the variance of $A^l_k(x)$, $|x| = 2$, is slightly better than that of $A^l_k(x)$, $|x| = 3$, as specified by Theorem 8.

TABLE IV

Simulation Result of an 8-Descendant Tree, the Loss Rate of the Root Link = 5%, 4 of the 8 Have Loss Rate = 1% and the Other 4 Have Loss Rate = 5%

Estimators  OMLE              A_k(2)            A_k(3)            A^l_k(x), |x|=2   A^l_k(x), |x|=3
Samples     Mean    Var       Mean    Var       Mean    Var       Mean    Var       Mean    Var
300         0.0503  2.15E-04  0.0504  2.15E-04  0.0505  2.14E-04  0.0508  2.18E-04  0.0505  2.16E-04
900         0.0511  5.85E-05  0.0511  5.81E-05  0.0511  5.79E-05  0.0512  5.79E-05  0.0512  5.88E-05
1500        0.0502  2.24E-05  0.0502  2.24E-05  0.0502  2.23E-05  0.0503  2.33E-05  0.0502  2.32E-05
2100        0.0507  1.16E-05  0.0507  1.19E-05  0.0507  1.20E-05  0.0507  1.09E-05  0.0507  1.13E-05
2700        0.0507  1.31E-05  0.0507  1.34E-05  0.0507  1.35E-05  0.0508  1.35E-05  0.0507  1.34E-05

To compare sensitivity and robustness, another round of simulation is carried out on the same network. The difference between this round and the previous one lies in the loss rates of the subtrees connected to node $k$: 6 of the 8 subtrees have a loss rate of 1% and the other two have a loss rate of 5%. The two subtrees considered by $A^l_k(x)$, $|x| = 2$, have loss rates of 1% and 5%, respectively, whereas two of the three subtrees considered by $A^l_k(x)$, $|x| = 3$, have a loss rate of 1% and the third has a loss rate of 5%. The results are presented in Table III. Comparing Table III with Table II, there is no change for the original MLE and there are slight changes for the estimates obtained by $A_k(2)$ and $A_k(3)$. That confirms the robustness of the original MLE and of $A_k(i)$ over $A^l_k(x)$. In contrast to the original MLE and $A_k(i)$, the variances of the estimates obtained by $A^l_k(x)$, $|x| = 2$, and $A^l_k(x)$, $|x| = 3$, differ noticeably from their counterparts, in particular when the sample size is small, because each of these estimators has a descendant with a higher loss rate than that used in the first round. The advantage of this shows in the mean obtained by $A^l_k(x)$, $|x| = 2$, which approaches the true value more quickly than in the first round and more quickly than that of $A^l_k(x)$, $|x| = 3$. This is because one of the two descendants considered by $A^l_k(x)$, $|x| = 2$, has a higher loss rate than the other, which increases the probability of matching the predictor to its observation. In contrast to $A^l_k(x)$, $|x| = 2$, the mean of $A^l_k(x)$, $|x| = 3$, changes little from that obtained in the first round. This reflects the tradeoff between efficiency and robustness among the $A^l_k(x)$: the larger $|x|$ is, the more robust $A^l_k(x)$ is to turbulence in the loss rates of the subtrees in $x$. To obtain a result similar to that of the original MLE, we should select subtrees whose loss rates equal 1% for $A^l_k(x)$, $|x| = 2$ or 3. Then, the same result as that presented in Table II should be obtained.

To verify the claim made at the end of the last paragraph, we conduct another round of simulation, where the loss rate of the path of interest is increased from 1% to 5% and the loss rates of the eight subtrees rooted at node $k$ are divided into two groups: four of them are set to 5% and the other four to 1%. The two estimators from IBE, i.e., $A^l_k(x)$, $|x| = 2$ and 3, consider the observations obtained from the subtrees whose loss rates equal 1%. The results, presented in Table IV, confirm that the estimates of $A^l_k(x)$ can be as good as those of the OMLE. In comparison with Table II, there are two noticeable differences in Table IV:

• the means of the estimates approach the true value more quickly; and

• the variances are an order of magnitude higher.

The first can be derived from Theorem 7 and Corollary 2, since the efficiency of an estimator is inversely proportional to $A_k$, whereas the second follows from Theorem 8, which states that a smaller $\delta_k(x)$ results in a larger variance.

The simulations show that the original MLE is undoubtedly the most robust estimator, fitting all three situations well, although it reacts more slowly than some of the estimators proposed in this paper to variation in the observations. On the other hand, in each of the situations there is always an estimator whose performance is similar to that of the MLE. The findings of this paper make it possible to identify a suitable estimator according to end-to-end observation.
VIII. Conclusion

This paper starts from finding inspirations that can lead to efficient explicit estimators for loss tomography and ends with a large number of unbiased or asymptotically unbiased and consistent explicit estimators, plus a number of theorems and corollaries that establish the statistical properties of the estimators. One of the most important findings is the formulae to compute the variances of the estimates obtained by the estimators in RSE, IBE, and the original MLE. Apart from clearly expressing the connection between the path to be estimated and the subtrees connecting the path to the observers of interest, the formulae potentially have many applications in network tomography, some of which have been identified in this paper. For instance, using the formulae, we have ranked the MLEs proposed so far, including those proposed in this paper. In addition, the formulae make model selection possible in loss tomography, so multicast in end-to-end measurement is no longer used only for creating various correlations but also for identifying the subtrees that can be used in estimation. The effectiveness of this strategy has been verified in a simulation study. Beyond these, there are other potential uses of the formulae and the findings that require further exploration.